We describe a simple method for certifying that an experimental device prepares a desired quantum state $\rho$. Our method is applicable to any pure state $\rho$, and it provides an estimate of the fidelity between $\rho$ and the actual (arbitrary) state in the lab, up to a constant additive error. The method requires measuring only a constant number of Pauli expectation values, selected at random according to an importance-weighting rule. Our method is faster than full tomography by a factor of $d$, the dimension of the state space, and extends easily and naturally to quantum channels.

In recent years there has been substantial progress in preparing many-body entangled quantum states in the laboratory [1]. A key step in such experiments is to verify that the state of the system is the desired one. This can be done using quantum state tomography, or techniques such as entanglement witnesses [2]. However, in many cases these solutions are not fully satisfactory. Tomography gives complete information about the state, but it is very resource-intensive, and has difficulty scaling to large systems. Entanglement witnesses can be much easier to implement, but are not a generic solution since known constructions only work for special quantum states.

Here we propose a new method, direct fidelity estimation, that is much faster than tomography, is applicable to a large class of quantum states, and requires minimal experimental resources. Let us first describe the setting of the problem. Consider a system of $n$ qubits, with Hilbert space dimension $d = 2^n$, and let $\rho$ be the desired state, i.e., the state we hope to accurately prepare. We make two basic assumptions. First, we assume that $\rho$ is pure. However, we do not assume any additional structure or symmetry, so our method goes beyond previous work [3, 4] to encompass nearly all of the states of interest in experimental quantum information science (e.g., the GHZ and W states, stabilizer states, cluster states, matrix product states, projected entangled pair states, etc.) in a unified framework. Second, we assume that we can measure $n$-qubit Pauli observables, that is, tensor products of single-qubit Pauli operators; we do not need to perform any other operations. Thus our method is applicable to any system that is capable of single-qubit gates and readout, without needing to rely on 2-qubit gates or entangled measurements.

Our method works by measuring a random subset of Pauli observables chosen according to an "importance-weighting" rule. Roughly, we select Pauli operators that are most likely to detect deviations from the desired state $\rho$. We use the resulting measurement statistics to estimate the fidelity $F(\rho,\sigma)$, where $\sigma$ is the actual state in the lab. Surprisingly, although there are $4^n$ distinct Pauli operators, we only need to sample a constant number of them to estimate $F(\rho,\sigma)$ up to a constant additive error, for arbitrary $\sigma$. That is, for every possible state $\sigma$, with high probability over the choice of Pauli measurements, we get an accurate estimate of $F(\rho,\sigma)$.

Although we measure only a constant number of Pauli observables, we need to repeat each measurement many times in order to estimate the corresponding expectation value. The number of repetitions depends on the desired state $\rho$. In the worst case, it is $O(d)$, but in many cases of practical interest, it is much smaller. For example, for stabilizer states, the number of repetitions is constant, independent of the size of the system; and for the W state, it is only quadratic in the number of qubits $n$.

Even in the worst case, our method requires far fewer resources than full tomography, both in theory and in practice. We demonstrate this by proving lower bounds on the sample complexity of full tomography, and by numerical simulations.

Finally, we show an analogous method for certifying any unitary quantum channel by estimating the entanglement fidelity. We discuss applications to benchmarking quantum circuits; as a special case, our method can certify Clifford circuits in constant time, independent of the number of qubits and gates.

Fidelity Estimation. The fidelity between our desired pure state $\rho$ and the actual state $\sigma$ is given by [5]:

$$F(\rho,\sigma) = \left(\mathrm{tr}\left[(\sqrt{\rho}\,\sigma\sqrt{\rho})^{1/2}\right]\right)^2 = \mathrm{tr}(\rho\sigma). \tag{1}$$

We can write $\mathrm{tr}(\rho\sigma)$ in terms of the Pauli expectation values of $\rho$ and $\sigma$. Let $W_k$ ($k = 1,\ldots,d^2$) denote all possible Pauli operators ($n$-fold tensor products of $I$, $\sigma_x$, $\sigma_y$ and $\sigma_z$). Define the characteristic function $\chi_\rho(k) = \mathrm{tr}(\rho W_k/\sqrt{d})$, and note that

$$\mathrm{tr}(\rho\sigma) = \sum_k \chi_\rho(k)\,\chi_\sigma(k). \tag{2}$$

In general, Eq. (2) involves the expectation values of all $d^2$ Pauli operators. However, it is easy to see that in certain cases fewer Pauli operators are required. For example, if $\rho$ is a stabilizer state, $\chi_\rho(k)$ takes on values of $\pm 1/\sqrt{d}$ at the $d$ points in the stabilizer group of $\rho$, and vanishes everywhere else. So the sum in (2) contains only $d$ terms, and one can compute $\mathrm{tr}(\rho\sigma)$ by measuring only $d$ Pauli operators. Furthermore, to merely estimate $\mathrm{tr}(\rho\sigma)$ one only needs to measure a small random subset of these Pauli operators. We will now generalize this strategy to work with an arbitrary pure state $\rho$.
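As an illustration of Eq. (2) and of the stabilizer-state remark, the following is a minimal sketch in plain Python that computes the characteristic function for a two-qubit example. The helper names (`kron`, `pauli_operators`, `characteristic_function`, and so on) and the choice of example states are assumptions made for this illustration; they are not taken from the text.

```python
import itertools
import math

# Single-qubit Pauli matrices as nested lists.
PAULIS = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}

def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def pauli_operators(n):
    """All 4**n n-qubit Pauli operators W_k, in a fixed (arbitrary) order."""
    ops = []
    for labels in itertools.product("IXYZ", repeat=n):
        m = [[1]]
        for s in labels:
            m = kron(m, PAULIS[s])
        ops.append(m)
    return ops

def trace_product(a, b):
    """tr(a b) for square matrices of equal size."""
    size = len(a)
    return sum(a[i][j] * b[j][i] for i in range(size) for j in range(size))

def characteristic_function(rho, n):
    """chi_rho(k) = tr(rho W_k) / sqrt(d); real because rho and W_k are Hermitian."""
    d = 2 ** n
    return [(trace_product(rho, w) / math.sqrt(d)).real for w in pauli_operators(n)]

def density_matrix(psi):
    """|psi><psi| for a state vector psi."""
    return [[a * complex(b).conjugate() for b in psi] for a in psi]

# Desired state (illustrative choice): the Bell state (|00> + |11>)/sqrt(2).
s = 1 / math.sqrt(2)
rho = density_matrix([s, 0, 0, s])
# "Actual" state (illustrative choice): the product state |00>.
sigma = density_matrix([1, 0, 0, 0])

chi_rho = characteristic_function(rho, 2)
chi_sigma = characteristic_function(sigma, 2)
overlap = sum(p * q for p, q in zip(chi_rho, chi_sigma))  # right-hand side of Eq. (2)
direct = trace_product(rho, sigma).real                   # tr(rho sigma) computed directly
nonzero = [c for c in chi_rho if abs(c) > 1e-12]          # support of chi_rho
```

For this stabilizer state, `nonzero` has $d = 4$ entries of magnitude $1/\sqrt{d}$, and `overlap` agrees with `direct`.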

We will construct an estimator for $\mathrm{tr}(\rho\sigma)$ as follows. Select $k \in \{1,\ldots,d^2\}$ at random with probability [6]

$$\Pr(k) = (\chi_\rho(k))^2. \tag{3}$$

By measuring the expectation value of the Pauli observable $W_k$, we can estimate $\chi_\sigma(k)$, up to some finite precision which we will discuss later. For the time being, let us suppose we can measure $\chi_\sigma(k)$ perfectly. We then construct the estimator

$$X = \chi_\sigma(k)/\chi_\rho(k). \tag{4}$$

It is easy to see that $\mathbb{E}X = \mathrm{tr}(\rho\sigma)$ (where $\mathbb{E}$ denotes the expected value over the random choice of $k$).
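Written out, the calculation behind this claim is one line. Only indices with nonzero characteristic function are ever sampled, and the probabilities are normalized because the desired state is pure, so that the squared characteristic function sums to one:

```latex
\mathbb{E}X
= \sum_{k:\,\chi_\rho(k)\neq 0} \Pr(k)\,\frac{\chi_\sigma(k)}{\chi_\rho(k)}
= \sum_{k:\,\chi_\rho(k)\neq 0} \chi_\rho(k)^2\,\frac{\chi_\sigma(k)}{\chi_\rho(k)}
= \sum_{k} \chi_\rho(k)\,\chi_\sigma(k)
= \mathrm{tr}(\rho\sigma).
```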

Now say we want to estimate $\mathrm{tr}(\rho\sigma)$ with some fixed additive error $\epsilon$ and failure probability $\delta$. We repeat the above process $\ell = \lceil 1/(\epsilon^2\delta)\rceil$ times: we choose $k_1,\ldots,k_\ell$ independently, which give independent estimates $X_1,\ldots,X_\ell$, and we let $Y = \frac{1}{\ell}\sum_{i=1}^{\ell}X_i$. By Chebyshev's inequality [7], $Y$ satisfies

$$\Pr[\,|Y - \mathrm{tr}(\rho\sigma)| \ge \epsilon\,] \le \delta. \tag{5}$$

To complete the description of our method, we show how the ideal "infinite-precision" estimator $Y$ can be approximated by an estimator $\tilde{Y}$ that uses a finite number of copies of the state $\sigma$. Given any choice of $k_1,\ldots,k_\ell$, we proceed as follows. For each $i = 1,\ldots,\ell$, we will use $m_i$ copies of $\sigma$, where we set

$$m_i = \left\lceil \frac{2}{d\,\chi_\rho(k_i)^2\,\ell\,\epsilon^2}\,\log(2/\delta) \right\rceil. \tag{6}$$

(Note that $m_i$ depends on $k_i$.) We measure the Pauli observable $W_{k_i}$ on each of these copies of $\sigma$, and get measurement outcomes $A_{ij} \in \{1,-1\}$ ($j = 1,\ldots,m_i$). Note that $\mathbb{E}A_{ij} = \sqrt{d}\,\chi_\sigma(k_i)$ (taking the expectation over the random measurement outcomes). Let

$$\tilde{X}_i = \frac{1}{m_i\sqrt{d}\,\chi_\rho(k_i)}\sum_{j=1}^{m_i}A_{ij}. \tag{7}$$

Finally, we let $\tilde{Y} = \frac{1}{\ell}\sum_{i=1}^{\ell}\tilde{X}_i$. This is our estimate for $Y$. (Note that $\mathbb{E}\tilde{Y} = Y$.) By Hoeffding's inequality [7], $\tilde{Y}$ has additive error $\epsilon$ and failure probability $\delta$:

$$\Pr[\,|\tilde{Y} - Y| \ge \epsilon\,] \le \delta. \tag{8}$$

We can then conclude that, with probability $\ge 1 - 2\delta$, the fidelity $F(\rho,\sigma)$ lies in the range $[\tilde{Y} - 2\epsilon, \tilde{Y} + 2\epsilon]$.
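The full procedure (sample Pauli indices, measure a few copies per setting, average) can be sketched as a small simulation. This is a toy under stated assumptions, not the authors' code: the function name `direct_fidelity_estimate`, the use of a depolarized Bell state as the "actual" state, and the way measurement outcomes are simulated from known expectation values are all choices made for this illustration.

```python
import math
import random

def direct_fidelity_estimate(chi_rho, pauli_mean_sigma, d, eps, delta, rng):
    """Sketch of the estimator; returns (Y_tilde, total number of copies m).

    chi_rho[k]          -- characteristic function of the desired pure state
    pauli_mean_sigma[k] -- tr(sigma W_k); used here only to simulate +/-1 outcomes
    """
    ell = math.ceil(1 / (eps ** 2 * delta))               # number of Pauli settings
    support = [k for k, c in enumerate(chi_rho) if c != 0.0]
    weights = [chi_rho[k] ** 2 for k in support]          # Pr(k) = chi_rho(k)^2, Eq. (3)
    total, copies = 0.0, 0
    for k in rng.choices(support, weights=weights, k=ell):
        # Number of copies for this setting, Eq. (6).
        m_i = math.ceil(2 * math.log(2 / delta) / (d * chi_rho[k] ** 2 * ell * eps ** 2))
        p_plus = (1 + pauli_mean_sigma[k]) / 2            # Pr(outcome = +1)
        outcome_sum = sum(1 if rng.random() < p_plus else -1 for _ in range(m_i))
        total += outcome_sum / (m_i * math.sqrt(d) * chi_rho[k])   # Eq. (7)
        copies += m_i
    return total / ell, copies

# Toy example (assumed for illustration): desired state is a two-qubit Bell state,
# listed only on its stabilizer group {II, XX, YY, ZZ}; chi_rho vanishes elsewhere.
d = 4
noise = 0.2                                               # depolarizing strength p
chi_rho = [0.5, 0.5, -0.5, 0.5]                           # tr(rho W) / sqrt(d)
# sigma = (1 - p) rho + p I/d, so tr(sigma W) = (1 - p) tr(rho W) for W != I.
mean_sigma = [1.0] + [(1 - noise) * math.sqrt(d) * c for c in chi_rho[1:]]
true_fidelity = (1 - noise) + noise / d

estimate, copies = direct_fidelity_estimate(
    chi_rho, mean_sigma, d, eps=0.125, delta=0.125, rng=random.Random(7))
```

With these parameters each sampled Pauli is measured on a single copy, which is the stabilizer-state behavior described in the text: the number of repetitions per setting does not grow with the system size.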

Our method uses $\ell = \lceil 1/(\epsilon^2\delta)\rceil$ Pauli observables, independent of the size of the system. It requires $m$ copies of the state $\sigma$, where $m = \sum_{i=1}^{\ell}m_i$. Though this depends on the random choices $k_i$, we have

$$\mathbb{E}(m_i) = \sum_{k_i}(\chi_\rho(k_i))^2\,m_i \le 1 + \frac{2d}{\ell\epsilon^2}\log(2/\delta), \tag{9}$$

and hence the expected number of copies satisfies

$$\mathbb{E}(m) \le 1 + \frac{1}{\epsilon^2\delta} + \frac{2d}{\epsilon^2}\log(2/\delta). \tag{10}$$

By Markov's inequality, $m$ is unlikely to exceed its expectation by much: $\Pr(m \ge t\cdot\mathbb{E}(m)) \le 1/t$, for all $t \ge 1$.
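The sample-size bookkeeping of Eqs. (6) and (10) is easy to tabulate. The sketch below uses hypothetical function names chosen for this illustration; the formulas are the ones stated above.

```python
import math

def num_settings(eps, delta):
    """ell = ceil(1 / (eps^2 * delta)): number of Pauli observables to sample."""
    return math.ceil(1 / (eps ** 2 * delta))

def copies_for_setting(chi, d, ell, eps, delta):
    """m_i from Eq. (6) for a sampled Pauli whose chi_rho(k_i) equals chi."""
    return math.ceil(2 * math.log(2 / delta) / (d * chi ** 2 * ell * eps ** 2))

def expected_copies_bound(d, eps, delta):
    """Right-hand side of Eq. (10): 1 + 1/(eps^2 delta) + (2 d / eps^2) log(2/delta)."""
    return 1 + 1 / (eps ** 2 * delta) + 2 * d * math.log(2 / delta) / eps ** 2

# Example call with the parameters of the numerical study (n = 8 qubits).
bound_n8 = expected_copies_bound(d=2 ** 8, eps=0.05, delta=0.05)
```

The second term of the bound, linear in $d$, dominates for large systems; this is the $O(d)$ worst-case cost quoted in the text.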

Example: the W state. Suppose our desired state $\rho$ is the W state, i.e., the uniform superposition over computational basis states where a single qubit is $|1\rangle$ and the rest are $|0\rangle$, as previously considered in [3, 4]. To apply our method, we need to sample Pauli operators from the probability distribution (3). It is straightforward to give a short formula for these probabilities, and an explicit algorithm that does the sampling in poly($n$) time [7].

The distribution for a W state is quite different from what one would expect for a Haar-random quantum state. For a random state, one expects most of the Pauli matrices to occur with probability $\sim 1/d^2$; but for the W state, most of the Pauli matrices have probability 0, and all the nonzero probabilities are at least $1/(n^2 d)$. This is an example of a well-conditioned state. As we now show, our method requires fewer resources for such states.

Well-conditioned states. We say that a state $\rho$ is well-conditioned with parameter $\alpha$ if for all $k$, either $\mathrm{tr}(\rho W_k) = 0$ or $|\mathrm{tr}(\rho W_k)| \ge \alpha$. For example, stabilizer states (including the GHZ state) and the W state are well-conditioned with $\alpha = 1$ and $\alpha = 1/n$, respectively, and Dicke states with $k$ excitations have $\alpha = O(1/n^k)$. When $\rho$ is well-conditioned, our method requires a smaller number of measurement settings, as well as fewer copies of the actual state $\sigma$. Note first that the estimator $X$ is bounded: $|X| \le 1/\alpha$. Now we can use the stronger Hoeffding inequality for Eq. (5), and we can choose the number of measurement settings to be $\ell = O\!\left(\frac{\log(1/\delta)}{\alpha^2\epsilon^2}\right)$. Thus, the dependence on $\delta$ is exponentially better, at a cost of a factor of $1/\alpha^2$.
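One way to see where this choice of the number of settings comes from is to apply the standard two-sided Hoeffding bound to the average of bounded estimators. The explicit constant 2 below is bookkeeping added for this illustration; the text states only the $O(\cdot)$ scaling.

```latex
|X_i| \le \frac{1}{\alpha}
\;\Longrightarrow\;
\Pr\bigl[\,|Y - \mathrm{tr}(\rho\sigma)| \ge \epsilon\,\bigr]
\le 2\exp\!\left(-\frac{2\ell^2\epsilon^2}{\ell\,(2/\alpha)^2}\right)
= 2\exp\!\left(-\frac{\ell\,\alpha^2\epsilon^2}{2}\right)
\le \delta
\quad\text{whenever}\quad
\ell \ge \frac{2\log(2/\delta)}{\alpha^2\epsilon^2}.
```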

The total number of copies used in the procedure, $m$, is bounded in expectation by (10). For well-conditioned states, we can prove a much stronger bound that holds with certainty: $m_i \le 1 + \frac{2\log(2/\delta)}{\alpha^2\ell\epsilon^2}$, and hence $m \le O\!\left(\frac{\log(1/\delta)}{\alpha^2\epsilon^2}\right)$. In particular, when $\rho$ is a stabilizer state, $m$ is independent of the size of the system; when $\rho$ is the W state, $m$ is only quadratic in the number of qubits $n$.

Truncating bad events. For an arbitrary pure state $\rho$, it is possible to modify our protocol so that $m$ is always bounded by $O\!\left(\frac{1}{\epsilon^2\delta} + \frac{d\log(1/\delta)}{\epsilon^2}\right)$. The idea is to construct a nearby $\rho'$ which is well-conditioned with $\alpha = O(1/\sqrt{d})$, by truncating small values of $\chi_\rho(k)$. This eliminates the bad choices of $k$ that cause $m$ to be large, at the expense of introducing a small bias into the fidelity estimate [7].

Dephasing and depolarizing noise. Our method also performs better if one makes some mild assumptions about the noise in the system. For an arbitrary pure state $\rho$, suppose the actual state $\sigma$ is given by $\sigma = \mathcal{E}(\rho)$, where $\mathcal{E}$ is some quantum process that shrinks the characteristic function, i.e., for all $k$, $|\chi_{\mathcal{E}(\rho)}(k)| \le |\chi_\rho(k)|$. For example, dephasing and depolarizing noise both do this. Again, this implies that $|X| \le 1$, hence we can use a smaller number of measurement settings, $\ell = O\!\left(\frac{\log(1/\delta)}{\epsilon^2}\right)$.

Comparison with full tomography. We have shown that it is possible to estimate the fidelity of an arbitrary pure state using Pauli measurements on $O(d)$ copies of the state. (In this discussion, let us fix the accuracy $\epsilon$ and failure probability $\delta$ to be constant.) How good is this result? We argue that our protocol is more efficient than full tomography by a factor of $d$. By tomography, we mean any procedure that distinguishes arbitrary quantum states with accuracy $\Delta$, so that for every pair of states $\rho$ and $\sigma$ with $F(\rho,\sigma) \le 1 - \Delta$, the procedure returns different outputs for $\rho$ and $\sigma$.

First, as a toy example, consider what is possible using arbitrary quantum operations. Fidelity estimation of a pure state can then be done with $O(1)$ copies using the swap test [8], while full tomography requires $\Omega(d/\mathrm{poly}\log d)$ copies, by Holevo's theorem [9] (see [7]).

In the more realistic situation where only Pauli measurements are allowed (and one cannot perform joint measurements on more than one copy of the state), fidelity estimation uses $O(d)$ copies. We now prove that full tomography requires at least $\Omega(d^2/\log d)$ copies. The idea of the proof is as follows (details in [7]). First, we construct a set of quantum states $|\phi_i\rangle$ that are almost orthogonal (for all $i \ne j$, $|\langle\phi_i|\phi_j\rangle|^2 \le 1 - \Delta$), and whose Pauli expectation values are small (for all $i$ and $k$ with $W_k \ne I$, $|\langle\phi_i|W_k|\phi_i\rangle| \le \tau\sqrt{\log d}/\sqrt{d}$). (This is done using repeated applications of Levy's lemma [7, 10].)

Now suppose there is some tomography procedure that can distinguish these states, given $t$ copies. This implies the existence of a classical protocol for transmitting $\Omega(d)$ bits of information over a particular noisy channel $\mathcal{E}$. Intuitively, Bob encodes an $\Omega(d)$-bit message $i$ by sending a string of $\pm 1$ bits through the channel $\mathcal{E}$, in such a way that when Alice receives these bits, they have the same distribution as the measurement outcomes she would have obtained by measuring Pauli observables on the state $|\phi_i\rangle$. Then Alice uses the tomography procedure to reconstruct $|\phi_i\rangle$ and extract the message $i$. One can show that the channel $\mathcal{E}$ has capacity $O((\log d)/d)$ (even allowing feedback from Alice to Bob) [11]. Then the converse to Shannon's (classical) noisy coding theorem [11] implies that $t \ge \Omega(d^2/\log d)$.

Numerics. In order to evaluate how tight our analysis is for typical states, we simulated our protocol as follows. We sampled Haar-random states of $n = 8$ qubits and ran our protocol with $\epsilon = \delta = .05$ (and $\ell = \frac{1}{\epsilon^2\delta}$), where the true state was created by subjecting the ideal state to independent 10% depolarizing noise. The residual error ($\tilde{Y} - F$) and the total number of copies $m$ are plotted as histograms in Fig. 1. We see that the accuracy is always well-behaved, and the total number of copies, excepting a few bad events (for which our truncation procedure applies), is typically close to the average.

[Figure 1 panel title: n = 8 qubits, ε = 0.05, δ = 0.05, depolarizing noise = 10%]

FIG. 1. Left: The residual error has a standard deviation of 1.8%. Right: Most states use only a typical number of copies, with just .1% of trials using more than four times the expected number of copies, as shown in the inset.

We also compared our method to a recent ion trap experiment, in which an 8-qubit W state was verified using full tomography [12]. Under the plausible assumption that dephasing noise is dominant, we would use our protocol with $\epsilon = .03$, $\delta = .10$, and $\ell = \lceil\log(1/\delta)/\epsilon^2\rceil$. Assuming the realistic parameters of 20 ms to perform one measurement and 400 ms to reconfigure a new measurement basis, we would obtain a fidelity estimate accurate to within ±1.2% using just 80 minutes of experiments and a few seconds of classical processing; this compares very favorably with the 10 hours of experiments and one week of post-processing carried out in [12].

Extension to channels. We now extend our method to unitary quantum channels. Let $\mathcal{U}$ be the desired channel corresponding to some unitary evolution $U$, i.e., $\mathcal{U} : \rho \mapsto U\rho U^\dagger$. Let $\mathcal{E}$ be the actual channel. We will estimate the entanglement fidelity, given by $F_e = \mathrm{tr}(\mathcal{U}^\dagger\mathcal{E})/d^2$ (with $\mathcal{U}$ and $\mathcal{E}$ treated as matrices acting via left multiplication).

Most of the analysis for channels is exactly analogous to the case of states. The main difference is that we may also input a state to the channel as well as choose how to measure at the output. Thus, the characteristic function for a channel $\mathcal{E}$ is defined by $\chi_{\mathcal{E}}(k,k') = \frac{1}{d}\mathrm{tr}(W_k\,\mathcal{E}(W_{k'}))$, which depends on two indices. The probability distribution from which we sample indices is analogous: $\Pr(k,k') = \frac{1}{d^2}[\chi_{\mathcal{U}}(k,k')]^2$, and so is our primary estimator: $X = \chi_{\mathcal{E}}(k,k')/\chi_{\mathcal{U}}(k,k')$, for which we have $\mathbb{E}X = F_e$. Now given $\ell$ independent samples from our probability distribution, $(k_1,k_1'),\ldots,(k_\ell,k_\ell')$, we compute $X_1,\ldots,X_\ell$, and let $Y = \frac{1}{\ell}\sum_{i=1}^{\ell}X_i$. Then choosing $\ell = \lceil 1/(\epsilon^2\delta)\rceil$ means that $Y$ is an estimate of $F_e$ which is accurate to within $\epsilon$ with a failure probability at most $\delta$.

The main difference between states and channels comes in how we estimate $X_i$ for a given sample $(k_i,k_i')$. We will still measure at the output, but how can we simulate inputting $W_{k_i'}$ into the channel? The key insight is that we can simply sample from states in the eigenbasis of $W_{k_i'}$ and put these states into the channel. Note that these states can always be chosen to be tensor products of local Pauli eigenstates, so no entangling gates are required.

The total number of uses of the channel is bounded in expectation by $\mathbb{E}(m) = O\!\left(\frac{1}{\epsilon^2\delta} + \frac{d^2}{\epsilon^2}\log(1/\delta)\right)$. Statements about well-conditioned channels and truncation also hold in analogy with states [7].

Benchmarking quantum circuits. One application of the above protocol is to evaluate experimental implementations of large quantum circuits: our method allows one to directly measure the entanglement fidelity and average fidelity of the entire circuit, rather than inferring it from tomography performed on individual gates. This is important because as circuits scale up, correlated noise potentially becomes an issue (cf. Ref. [13]).

The relationship between $F_e$ and the Haar-average fidelity is captured by the formula [14]

$$F_{\mathrm{avg}} = \int d\psi\, F(\mathcal{U}(\psi),\mathcal{E}(\psi)) = \frac{d}{d+1}F_e + \frac{1}{d+1}. \tag{11}$$

Thus, our method also gives us a direct measure of the typical performance of the channel, similar to what is achieved in other random benchmarking schemes [15-17]. Moreover, one can also prove that the worst-case behavior (as quantified by the diamond norm [18]) is bounded by $4d\sqrt{1 - F_e}$ [19], so that for small high-fidelity gates, average and worst-case behavior nearly coincide.
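The conversion between the two fidelities is a one-line affine map. A minimal sketch, with function names chosen for this illustration and using the standard relation $F_{\mathrm{avg}} = (dF_e + 1)/(d + 1)$:

```python
def average_fidelity(f_e, d):
    """Haar-average fidelity from the entanglement fidelity F_e in dimension d."""
    return (d * f_e + 1) / (d + 1)

def entanglement_fidelity(f_avg, d):
    """Inverse map: entanglement fidelity from the Haar-average fidelity."""
    return ((d + 1) * f_avg - 1) / d

# Example: a two-qubit gate (d = 4) with entanglement fidelity 0.95.
f_avg_example = average_fidelity(0.95, 4)
```

Because the map is affine with slope $d/(d+1)$, an additive error $\epsilon$ on $F_e$ becomes a slightly smaller additive error on the average fidelity.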

Clifford circuits. Clifford circuits (those consisting of controlled-NOT, Hadamard and phase gates) are key components in many schemes for quantum error-correction, and become universal for quantum computation when augmented with certain state preparations [20, 21]. For a Clifford circuit $\mathcal{U}$, the characteristic function is given by $\chi_{\mathcal{U}}(k,k') = 1$ (when $W_k = \mathcal{U}(W_{k'})$) and 0 otherwise. Sampling only requires that we pick $k' \in \{1,\ldots,d^2\}$ uniformly at random, then use the Gottesman-Knill theorem to efficiently compute the $k$ such that $W_k = \mathcal{U}(W_{k'})$. Clifford circuits are well-conditioned, so our method needs fewer measurement settings and uses of the channel, namely $\ell \le O\!\left(\frac{1}{\epsilon^2}\log(1/\delta)\right)$ and $m \le O\!\left(\frac{1}{\epsilon^2}\log(1/\delta)\right)$, which is independent of the number of qubits and gates (see also Ref. [22]).
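The sampling step for Clifford circuits can be sketched with the usual binary (x, z) representation of Pauli operators and the standard stabilizer-tableau update rules for Hadamard, phase and CNOT gates. The data layout, the gate tuples, and the function names below are assumptions made for this illustration; the sign bit is extra bookkeeping that records whether the conjugated Pauli picks up a minus sign.

```python
import random

def apply_gate(pauli, gate):
    """Conjugate a Pauli by one Clifford gate, in place.

    pauli is a dict with bit lists 'x' and 'z' and a 'sign' bit (0 for +, 1 for -);
    qubit q carries I, X, Z or Y for (x, z) = (0,0), (1,0), (0,1), (1,1).
    """
    x, z = pauli["x"], pauli["z"]
    name, qubits = gate[0], gate[1:]
    if name == "H":
        (a,) = qubits
        pauli["sign"] ^= x[a] & z[a]          # H maps Y to -Y
        x[a], z[a] = z[a], x[a]               # and swaps X with Z
    elif name == "S":
        (a,) = qubits
        pauli["sign"] ^= x[a] & z[a]          # S maps Y to -X
        z[a] ^= x[a]                          # and X to Y
    elif name == "CNOT":
        a, b = qubits                         # control a, target b
        pauli["sign"] ^= x[a] & z[b] & (x[b] ^ z[a] ^ 1)
        x[b] ^= x[a]                          # X on the control spreads to the target
        z[a] ^= z[b]                          # Z on the target spreads to the control
    else:
        raise ValueError(f"unknown gate {name!r}")

def propagate(pauli, circuit):
    """Return U W U^dagger, where the circuit lists gates in order of application."""
    out = {"x": list(pauli["x"]), "z": list(pauli["z"]), "sign": pauli["sign"]}
    for gate in circuit:
        apply_gate(out, gate)
    return out

def sample_setting(n, circuit, rng):
    """Pick the input Pauli uniformly among all 4**n and compute the matching output Pauli."""
    w_in = {"x": [rng.randrange(2) for _ in range(n)],
            "z": [rng.randrange(2) for _ in range(n)], "sign": 0}
    return w_in, propagate(w_in, circuit)

# Example: the circuit H(0), CNOT(0 -> 1) maps Z on qubit 0 to X on both qubits.
bell_circuit = [("H", 0), ("CNOT", 0, 1)]
image_of_zi = propagate({"x": [0, 0], "z": [1, 0], "sign": 0}, bell_circuit)
```

Each gate costs constant time per Pauli, so computing the output index takes time linear in the number of gates, in line with the efficiency claim based on the Gottesman-Knill theorem.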

Outlook. We have presented a general method for certifying pure states and unitary quantum channels, which requires only Pauli measurements and is faster than full tomography by a factor of $d$. In common cases such as stabilizer states, the W state, and Clifford circuits, our method requires even fewer resources (constant or polynomial in the number of qubits), and it provides an easy recipe to generalize beyond these examples.

Looking beyond fidelity estimation, it would be interesting to directly estimate and bound an entanglement measure [23], which would obviate the need for an entanglement witness. One may also compare our method with recent proposals for tomography for restricted classes of quantum states [24-27]. Another important direction is to find better techniques for sampling the importance-weighting distribution $\Pr(k)$ for different classes of states.

We thank D. Gross, J. Preskill and T. Monz for helpful discussions. YKL was supported by NIST Grant No. 60NANB10D262, and STF by NSF Grant No. PHY-0803371 and ARO Grant No. W911NF-09-1-0442.

We would also like to note that M. da Silva, O. Landon-Cardinal and D. Poulin [28] have recently and independently obtained similar results.

Bounding the Failure Probabilities

To show Eq. (5), observe that the variance of each individual estimator $X_i$ is not too large,

$$\mathrm{Var}(X_i) = \mathbb{E}(X_i^2) - (\mathbb{E}X_i)^2 \tag{12}$$
$$= \sum_k[\chi_\sigma(k)]^2 - [\mathrm{tr}(\rho\sigma)]^2 \tag{13}$$
$$= \mathrm{tr}(\sigma^2) - [\mathrm{tr}(\rho\sigma)]^2 \le 1. \tag{14}$$

This implies that $\mathrm{Var}(Y) \le 1/\ell$. Hence, by Chebyshev's inequality,

$$\Pr\left[\,|Y - \mathrm{tr}(\rho\sigma)| \ge \lambda/\sqrt{\ell}\,\right] \le \frac{1}{\lambda^2}. \tag{15}$$

Then set $\lambda = 1/\sqrt{\delta}$ and $\ell = \lceil 1/(\epsilon^2\delta)\rceil$.

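To show Eq. (8), we use Hoeffding's inequality, which says that for all $\epsilon > 0$,

$$\Pr[\,|\tilde{Y} - Y| \ge \epsilon\,] \le 2\exp(-2\epsilon^2/C), \tag{16}$$

where

$$C = \sum_{i=1}^{\ell}\sum_{j=1}^{m_i}(2c_i)^2, \qquad c_i = \frac{1}{\ell\,m_i\sqrt{d}\,\chi_\rho(k_i)}. \tag{17}$$

Setting $m_i$ as in Eq. (6), we get

$$C = \sum_{i=1}^{\ell}\frac{4}{\ell^2\,m_i\,d\,\chi_\rho(k_i)^2} \le \frac{2\epsilon^2}{\log(2/\delta)}, \tag{18}$$

hence the failure probability is $\le \delta$, as claimed.

Efficient sampling for the W state

Recall the definition of the W state,

$$|W\rangle = \frac{1}{\sqrt{n}}\sum_{|b|=1}|b\rangle, \tag{19}$$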



where the sum is over all $n$-bit strings $b$ with Hamming weight $|b| = 1$. We can factor any $n$-qubit Pauli operator into a tensor product of local Pauli $\sigma_x$ operators times a tensor product of local Pauli $\sigma_z$ operators (up to an irrelevant phase). Our probability distribution follows from the definition in Eq. (3),

$$p(j,k) = \Pr(\sigma_x^j\sigma_z^k) = \frac{1}{d}\,\bigl|\langle W|\sigma_x^j\sigma_z^k|W\rangle\bigr|^2, \tag{20}$$

where we denote the tensor product by a bit string in the exponent. (Thus, for example, $\sigma_x^{10}\sigma_z^{11} = \sigma_y\otimes\sigma_z$ up to an irrelevant phase.)

$$p(j,k) = \frac{1}{n^2 d}\,\Bigl|\sum_{|a|=|b|=1}\langle a|\sigma_x^j\sigma_z^k|b\rangle\Bigr|^2 \tag{21}$$
$$= \frac{1}{n^2 d}\,\Bigl|\sum_{|a|=|b|=1}(-1)^{b\cdot k}\,\delta_{a,b+j}\Bigr|^2, \tag{22}$$

where the arithmetic in the delta function is modulo 2. The delta function tells us that the tensor product over $\sigma_x$ must contain either 0 or 2 factors of $\sigma_x$; all other terms have zero probability. Let's separate out the case where there are no $\sigma_x$ operators from when there are two. If there are none, then

$$p(0,k) = \frac{1}{n^2 d}\,\Bigl|\sum_{|b|=1}(-1)^{b\cdot k}\Bigr|^2 = \frac{1}{n^2 d}\,\Bigl|\sum_{i=1}^{n}(-1)^{k_i}\Bigr|^2 \tag{23}$$
$$= \frac{1}{n^2 d}\,(n - 2|k|)^2. \tag{24}$$

If $j$ has weight 2, then the summand reduces to only two terms, since flipping two bits in the weight-1 string $b$ will (with two exceptions) increase the weight, making it orthogonal to the weight-1 string $a$.

$$p(j,k) = \frac{1}{n^2 d}\,\Bigl|\sum_{|a|=|b|=1}(-1)^{b\cdot k}\,\delta_{a,b+j}\Bigr|^2 \tag{25}$$
$$= \frac{1}{n^2 d}\,\bigl|1 + (-1)^{j\cdot k}\bigr|^2. \tag{26}$$

This is clearly either 0 or $4/(n^2 d)$ depending on $j\cdot k$ mod 2. To summarize, we have the following formula for the probabilities

$$p(j,k) = \begin{cases}\frac{1}{n^2 d}(n - 2|k|)^2 & \text{if } j = 0,\\[2pt] \frac{4}{n^2 d} & \text{if } |j| = 2,\ j\cdot k = 0,\\[2pt] 0 & \text{otherwise,}\end{cases} \tag{27}$$

where again, the dot product $j\cdot k$ is taken mod 2.

Given this formula, we have the following simple procedure to sample from this distribution. The procedure consists of two steps. First, flip a weighted coin to see if you are in the first or the second branch. The total weight in the first branch (with $j = 0$) is $1/n$, a fact that follows from some simple binomial identities, or by directly computing the weight in the second branch. If we are in this first branch, then all strings $k$ of a given Hamming weight are equally probable. We can sample from this by first picking the weight $w = |k|$ from the normalized distribution

$$q(w) = \frac{1}{nd}\binom{n}{w}(n - 2w)^2. \tag{28}$$

Since this distribution only has $n + 1$ outcomes ($w = 0,\ldots,n$), we can sample from it efficiently in $n$. Then we just choose a random bit string with the given sampled weight. Now suppose that we are in the second branch after the initial coin flip. Then we choose uniformly from all $\binom{n}{2}$ bit strings of length $n$ containing exactly two ones, and this defines $j$. Then we pick a bit string $k$ by choosing a uniformly random bit string of length $n - 1$: we take (say) the first bit and copy it to the two sites in $k$ which are supported by $j$, to enforce the condition $j\cdot k = 0$ mod 2, and distribute the remaining random bits over the rest of $k$ sequentially.
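The two-branch procedure above can be sketched directly in Python. The function names and the representation of Pauli operators as pairs of bit tuples are assumptions made for this illustration; the probabilities are the ones derived above (weight $(n-2|k|)^2/(n^2 d)$ for $j = 0$, weight $4/(n^2 d)$ for $|j| = 2$ with $j\cdot k$ even, and zero otherwise), and exact rational arithmetic is used so that normalization can be checked without rounding.

```python
import math
import random
from fractions import Fraction
from itertools import product

def w_state_probability(j, k):
    """p(j, k) for bit tuples j (sigma_x part) and k (sigma_z part)."""
    n = len(j)
    d = 2 ** n
    if sum(j) == 0:
        return Fraction((n - 2 * sum(k)) ** 2, n * n * d)
    if sum(j) == 2 and sum(a * b for a, b in zip(j, k)) % 2 == 0:
        return Fraction(4, n * n * d)
    return Fraction(0)

def sample_w_state_pauli(n, rng):
    """Two-step sampler; returns bit tuples (j, k). Assumes n >= 2."""
    if rng.random() < 1 / n:                        # first branch has total weight 1/n
        # Pick the Hamming weight w with probability proportional to C(n, w) (n - 2w)^2.
        weights = [math.comb(n, w) * (n - 2 * w) ** 2 for w in range(n + 1)]
        w = rng.choices(range(n + 1), weights=weights)[0]
        ones = set(rng.sample(range(n), w))         # uniformly random string of weight w
        return (0,) * n, tuple(int(i in ones) for i in range(n))
    a, b = rng.sample(range(n), 2)                  # support of j: two distinct sites
    j = tuple(int(i in (a, b)) for i in range(n))
    k = [rng.randrange(2) for _ in range(n)]        # random bits everywhere ...
    k[b] = k[a]                                     # ... then force k_a = k_b, so j.k = 0 mod 2
    return j, tuple(k)

# Exhaustive normalization check for a small system.
n_check = 3
all_strings = list(product((0, 1), repeat=n_check))
total_probability = sum(w_state_probability(j, k) for j in all_strings for k in all_strings)
```

Overwriting one of the two supported bits with the other leaves $n - 1$ free uniform bits, which is the same distribution as the "copy the first bit" description in the text.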

Truncating Bad Events

The modified procedure is as follows: construct a new state $\rho_1$ by defining its characteristic function to be

$$\chi_{\rho_1}(k) = \begin{cases}\chi_\rho(k) & \text{if } |\chi_\rho(k)| \ge \beta/d,\\ 0 & \text{otherwise.}\end{cases} \tag{29}$$

Define $\rho_2 = \rho_1/\|\rho_1\|_2$, where $\|\rho\|_2 = \sqrt{\mathrm{tr}(\rho^2)}$ is the Schatten 2-norm. Then perform our original certification procedure using $\rho_2$, to estimate $\mathrm{tr}(\rho_2\sigma)$. Actually, note that $\rho_2$ may not be a density matrix (it may not be positive semidefinite with trace 1); nonetheless, it satisfies $\mathrm{tr}(\rho_2^2) = 1$, so the certification procedure makes sense.

We can bound $m$ as follows: note that, for all $k$, either $\chi_{\rho_2}(k) = 0$, or $|\chi_{\rho_2}(k)| \ge |\chi_{\rho_1}(k)| \ge \beta/d$. Then, with probability 1, we have $m_i \le 1 + \frac{2d}{\beta^2\ell\epsilon^2}\log(2/\delta)$ and $m \le 1 + \frac{1}{\epsilon^2\delta} + \frac{2d}{\beta^2\epsilon^2}\log(2/\delta)$.

We claim that $\mathrm{tr}(\rho_2\sigma)$ gives us an estimate of $\mathrm{tr}(\rho\sigma)$, with some bias that is not too large. Clearly,

$$|\mathrm{tr}(\rho_2\sigma) - \mathrm{tr}(\rho\sigma)| \le \|\rho_2 - \rho\|_2, \tag{30}$$

and the quantity on the right-hand side can be calculated explicitly, given knowledge of $\rho$. In the worst case, we claim that $\|\rho_2 - \rho\|_2 \le 2\beta$. To see this, note that $\|\rho_1 - \rho\|_2 \le \beta$, and $1 - \beta \le \|\rho_1\|_2 \le 1$, hence $\|\rho_2 - \rho_1\|_2 \le \beta$.
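The truncation step acts only on the list of characteristic-function values, so it can be sketched without any matrices. The function names and the single-qubit example are assumptions made for this illustration; the 2-norm is computed from the characteristic function, which is valid because the normalized Pauli operators form an orthonormal basis.

```python
import math

def truncate_characteristic_function(chi, d, beta):
    """Zero out entries with |chi| < beta/d, then rescale to unit Schatten 2-norm."""
    chi_1 = [c if abs(c) >= beta / d else 0.0 for c in chi]   # truncated state rho_1
    norm_1 = math.sqrt(sum(c * c for c in chi_1))             # ||rho_1||_2
    return [c / norm_1 for c in chi_1]                        # renormalized state rho_2

def two_norm_distance(chi_a, chi_b):
    """||A - B||_2 computed from the two characteristic functions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(chi_a, chi_b)))

# Illustrative single-qubit pure state cos(theta)|0> + sin(theta)|1>, whose
# characteristic function on (I, X, Y, Z) is (1, sin 2theta, 0, cos 2theta)/sqrt(2).
theta, d, beta = 0.1, 2, 0.3
chi = [1 / math.sqrt(2),
       math.sin(2 * theta) / math.sqrt(2),
       0.0,
       math.cos(2 * theta) / math.sqrt(2)]
chi_2 = truncate_characteristic_function(chi, d, beta)
bias_bound = two_norm_distance(chi_2, chi)    # bound on the bias of the fidelity estimate
```

Here the small X component falls below the threshold and is dropped, and the resulting bias bound stays well under the worst-case value $2\beta$.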



Lower Bound for Tomography

As a toy example, consider the situation where we can perform arbitrary quantum operations. In this setting, full tomography of a pure state with constant accuracy requires at least $\Omega(d)$ copies (up to log factors); this follows from the existence of sets of $2^{\Omega(d)}$ almost-orthogonal pure states [8], and Holevo's theorem [9]. Note that this lower bound is tight: full tomography can be done with $O(d)$ copies, by using random POVM measurements [29, Thm. 3] to perform state discrimination on an $\epsilon$-net of pure states [30, Lemma II.4].

Now consider the more realistic situation where only Pauli measurements are allowed. We prove that full tomography using Pauli measurements requires $\Omega(d^2/\log d)$ copies of the state.

First step. We want to construct a large set of nearly-orthogonal quantum states that have small Pauli expectation values. To do this, we will use the following lemma:

Lemma 1. Fix any states $|\phi_1\rangle,\ldots,|\phi_s\rangle \in \mathbb{C}^d$, where $s \le 2^{cd}$ and $c$ is some constant. Then there exists a state $|\psi\rangle \in \mathbb{C}^d$ such that:

$$\forall i \in \{1,\ldots,s\},\quad |\langle\phi_i|\psi\rangle| \le \epsilon, \tag{31}$$
$$\forall W_k \ne I,\quad |\langle\psi|W_k|\psi\rangle| \le \tau\sqrt{\log d}/\sqrt{d}. \tag{32}$$

The proof of the lemma is as follows. Choose $\psi$ to be a Haar-random vector in $S^{2d-1}$. We claim that (31) and (32) are satisfied with high probability.

First, for each $i$, observe that $\langle\phi_i|\psi\rangle$ is a smooth function of $\psi$, with Lipschitz coefficient $\eta = 1$:

$$|\langle\phi_i|\psi\rangle - \langle\phi_i|\psi'\rangle| \le \|\psi - \psi'\|_2. \tag{33}$$

By symmetry, $\mathbb{E}\langle\phi_i|\psi\rangle = 0$. So by Levy's lemma [10],

$$\Pr[\,|\langle\phi_i|\psi\rangle| \ge \epsilon\,] \le 4\exp(-C_1 d\,\epsilon^2/\eta^2), \tag{34}$$

where $C_1 = 2/(9\pi^3)$. Taking the union bound over all $i$, we get that

$$\Pr[\text{Eq. (31) fails for some } i] \le 4\exp(cd(\log 2) - C_1 d\,\epsilon^2) = 4\exp(-cd(\log 2)) = 4\cdot 2^{-cd}. \tag{35}$$

Next, for each $k$, observe that $\langle\psi|W_k|\psi\rangle$ is a smooth function of $\psi$, with Lipschitz coefficient $\eta = 2$:

$$|\langle\psi|W_k|\psi\rangle - \langle\psi'|W_k|\psi'\rangle| \le \bigl|\langle\psi|W_k\bigl[|\psi\rangle - |\psi'\rangle\bigr]\bigr| + \bigl|\bigl[\langle\psi| - \langle\psi'|\bigr]W_k|\psi'\rangle\bigr| \le 2\|\psi - \psi'\|_2. \tag{36}$$

By symmetry, $\mathbb{E}\langle\psi|W_k|\psi\rangle = 0$. So by Levy's lemma [10],

$$\Pr\left[\,|\langle\psi|W_k|\psi\rangle| \ge \tau\sqrt{\log d}/\sqrt{d}\,\right] \le 4\exp(-C_1\tau^2(\log d)/\eta^2), \tag{37}$$

where $C_1 = 2/(9\pi^3)$. Taking the union bound over all $k$, we get that

$$\Pr[\text{Eq. (32) fails for some } k] \le 4\exp(2\log d - C_1\tau^2(\log d)/4) = 4\exp(-2\log d) = 4/d^2. \tag{38}$$

Here $\epsilon = \sqrt{9\pi^3(\log 2)c}$ and $\tau = \sqrt{72\pi^3}$.

This proves the lemma.
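The constants in the lemma are chosen so that the two union-bound exponents collapse to simple expressions. A short numerical check, with function names chosen for this illustration, confirms the arithmetic:

```python
import math

C1 = 2 / (9 * math.pi ** 3)            # constant in Levy's lemma as quoted in the proof

def epsilon_for(c):
    """Overlap bound epsilon = sqrt(9 pi^3 (log 2) c)."""
    return math.sqrt(9 * math.pi ** 3 * math.log(2) * c)

tau = math.sqrt(72 * math.pi ** 3)     # constant in the Pauli expectation bound

def overlap_exponent(c, d):
    """c d log 2 - C1 d epsilon^2 (Lipschitz coefficient 1); should equal -c d log 2."""
    return c * d * math.log(2) - C1 * d * epsilon_for(c) ** 2

def pauli_exponent(d):
    """2 log d - C1 tau^2 (log d) / 4 (Lipschitz coefficient 2); should equal -2 log d."""
    return 2 * math.log(d) - C1 * tau ** 2 * math.log(d) / 4
```

Note that `tau` is about 47, so the bound $\tau\sqrt{\log d}/\sqrt{d}$ on the Pauli expectation values only becomes nontrivial for fairly large $d$; the lemma is an asymptotic statement.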

By applying the above lemma repeatedly, we can construct a set of $2^{\Omega(d)}$ quantum states $|\phi_i\rangle$ that are almost orthogonal (for all $i \ne j$, $|\langle\phi_i|\phi_j\rangle|^2 \le 1 - \Delta$), and whose Pauli expectation values are small (for all $i$ and $k$ with $W_k \ne I$, $|\langle\phi_i|W_k|\phi_i\rangle| \le \tau\sqrt{\log d}/\sqrt{d}$).

Second step. Suppose there is some tomography procedure that can distinguish among the states $|\phi_i\rangle$, given $t$ copies. We now construct a classical protocol for transmitting $\Omega(d)$ bits of information over a particular noisy channel $\mathcal{E}$.

Let $\mathcal{E}$ be the classical channel that takes a bit $b \in \{1,-1\}$ and outputs a bit $b' \in \{1,-1\}$, where with probability $\tau\sqrt{\log d}/\sqrt{d}$, the channel sets $b' = b$, and with probability $1 - \tau\sqrt{\log d}/\sqrt{d}$, the channel chooses $b' \in \{1,-1\}$ uniformly at random. Using the tomography procedure, we will show how to send messages over this channel (together with a noiseless feedback channel).

Say Bob wants to send $\Omega(d)$ bits to Alice. He associates the message with a state $|\phi_i\rangle$. Alice runs the tomography procedure. When she wants to measure some Pauli matrix $W_k$, she sends $k$ to Bob (over the noiseless feedback channel). Bob chooses a random $b \in \{1,-1\}$ with expectation value $\langle\phi_i|W_k|\phi_i\rangle\cdot\sqrt{d}/(\tau\sqrt{\log d})$, and sends $b$ through the channel $\mathcal{E}$ to Alice. Alice receives $b'$, which has expectation value $\langle\phi_i|W_k|\phi_i\rangle$.

For tomography using $t$ copies, Bob sends $t$ bits through the channel $\mathcal{E}$ (in addition to the feedback bits sent by Alice). But $\mathcal{E}$ is simply the binary symmetric channel, which has capacity $\le \tau^2(\log d)/d$. Furthermore, feedback does not increase its capacity [11]. So, by the converse to Shannon's (classical) noisy coding theorem [11], Bob must use the channel at least $\Omega(d^2/\log d)$ times to send $\Omega(d)$ bits. Hence $t \ge \Omega(d^2/\log d)$.
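The capacity bound used in this step can be checked numerically. A channel that keeps the bit with probability $p$ and otherwise outputs a uniformly random bit is a binary symmetric channel with flip probability $(1-p)/2$, so its capacity in bits is one minus the binary entropy of that flip probability, which is at most $p^2$. The function names below are chosen for this illustration.

```python
import math

def binary_entropy(q):
    """Binary entropy H(q) in bits, with H(0) = H(1) = 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def channel_capacity(p):
    """Capacity in bits of the channel that keeps b with probability p and
    otherwise outputs a uniformly random bit (flip probability (1 - p)/2)."""
    return 1 - binary_entropy((1 - p) / 2)

def min_channel_uses(message_bits, p):
    """Converse-style lower bound on channel uses: message_bits / capacity."""
    return message_bits / channel_capacity(p)

# With p standing in for tau*sqrt(log d)/sqrt(d), the capacity is at most p^2.
capacities = {p: channel_capacity(p) for p in (0.01, 0.1, 0.5, 0.9, 1.0)}
```

Since the capacity is at most $p^2 = \tau^2(\log d)/d$, sending $\Omega(d)$ bits needs at least $\Omega(d^2/\log d)$ channel uses, which is the arithmetic behind the final step.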



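Estimating entanglement fidelity for channels

Here we give a detailed description of our method for certifying quantum channels. Let $\mathbb{C}^{d\times d}_H$ denote the set of Hermitian matrices in $\mathbb{C}^{d\times d}$. We will view $\mathbb{C}^{d\times d}_H$ as a vector space, with Hilbert-Schmidt inner product $\mathrm{tr}(A^\dagger B)$. We use round bra-kets to denote this: $|A)$ is a vector, $(B|$ is an adjoint vector, and $(A|B) = \mathrm{tr}(A^\dagger B)$ is an inner product.

Let $\mathcal{L}(\mathbb{C}^{d\times d}_H,\mathbb{C}^{d\times d}_H)$ be the vector space of all linear maps from $\mathbb{C}^{d\times d}_H$ to $\mathbb{C}^{d\times d}_H$, again with Hilbert-Schmidt inner product $\mathrm{tr}(\mathcal{A}^\dagger\mathcal{B})$. Now recall the Pauli matrices $|W_k) \in \mathbb{C}^{d\times d}_H$ ($k = 1,\ldots,d^2$). Note that $\frac{1}{d}|W_k)(W_{k'}|$ ($k,k' \in \{1,\ldots,d^2\}$) form an orthonormal basis for $\mathcal{L}(\mathbb{C}^{d\times d}_H,\mathbb{C}^{d\times d}_H)$. For any channel $\mathcal{E} \in \mathcal{L}(\mathbb{C}^{d\times d}_H,\mathbb{C}^{d\times d}_H)$, we define its characteristic function to be

$$\chi_{\mathcal{E}}(k,k') = \mathrm{tr}\Bigl[\bigl(\tfrac{1}{d}|W_k)(W_{k'}|\bigr)^\dagger\mathcal{E}\Bigr] = \tfrac{1}{d}(W_k|\mathcal{E}|W_{k'}) = \tfrac{1}{d}\mathrm{tr}\bigl(W_k^\dagger\,\mathcal{E}(W_{k'})\bigr). \tag{39}$$

(Note that $\chi_{\mathcal{E}}(k,k')$ is real, since $W_k$ and $\mathcal{E}(W_{k'})$ are Hermitian.) Then

$$\mathcal{E} = \sum_{k,k'}\chi_{\mathcal{E}}(k,k')\,\tfrac{1}{d}|W_k)(W_{k'}|, \tag{40}$$

and the overlap between $\mathcal{U}$ and $\mathcal{E}$ is given by

$$\mathrm{tr}(\mathcal{U}^\dagger\mathcal{E}) = \sum_{k,k'}\chi_{\mathcal{U}}(k,k')\,\chi_{\mathcal{E}}(k,k'). \tag{41}$$

Note that for any channel $\mathcal{E}$, $0 \le \mathrm{tr}(\mathcal{E}^\dagger\mathcal{E}) \le d^2$, and since $\mathcal{U}$ is a unitary channel, $\mathrm{tr}(\mathcal{U}^\dagger\mathcal{U}) = d^2$. This implies $|\mathrm{tr}(\mathcal{U}^\dagger\mathcal{E})| \le d^2$. We will be interested in estimating $\mathrm{tr}(\mathcal{U}^\dagger\mathcal{E})/d^2$ up to an additive error of size $\epsilon$.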

We will construct an estimator for $\mathrm{tr}(\mathcal{U}^\dagger\mathcal{E})/d^2$ as follows. Select $(k,k') \in \{1,\ldots,d^2\}^2$ at random with probability

$$\Pr(k,k') = \frac{1}{d^2}[\chi_{\mathcal{U}}(k,k')]^2. \tag{42}$$

(Note that these probabilities are normalized, since $\mathrm{tr}(\mathcal{U}^\dagger\mathcal{U}) = d^2$.) We can estimate $\chi_{\mathcal{E}}(k,k')$, up to some finite precision, by preparing eigenstates of $W_{k'}$, applying the channel $\mathcal{E}$, and then measuring the observable $W_k$; we will discuss this below. We then compute the quantity

$$X = \chi_{\mathcal{E}}(k,k')/\chi_{\mathcal{U}}(k,k'). \tag{43}$$

It is easy to see that $\mathbb{E}X = \mathrm{tr}(\mathcal{U}^\dagger\mathcal{E})/d^2$.

We want an estimate with additive error $\epsilon$ and failure probability $\delta$, so we repeat the above process $\ell = \lceil 1/(\epsilon^2\delta)\rceil$ times: we choose $(k_1,k_1'),\ldots,(k_\ell,k_\ell')$ independently, which give independent estimates $X_1,\ldots,X_\ell$, and we let $Y = \frac{1}{\ell}\sum_{i=1}^{\ell}X_i$. (Note that the number of Pauli observables $\ell$ is independent of the size of the system.) By Chebyshev's inequality,

$$\Pr[\,|Y - \mathrm{tr}(\mathcal{U}^\dagger\mathcal{E})/d^2| \ge \epsilon\,] \le \delta. \tag{44}$$

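Finally, we describe how to estimate $Y$ from a finite number of uses of the channel $\mathcal{E}$. Fix any choice of $(k_i,k_i')$ for $i = 1,\ldots,\ell$. We then estimate $Y$ as follows. For each $i = 1,\ldots,\ell$:

• Choose some eigenbasis for the Pauli matrix $W_{k_i'}$, call it $|\phi^i_a\rangle$ ($a = 1,\ldots,d$), and let $\lambda^i_a \in \{1,-1\}$ be the corresponding eigenvalues. (Note that one can choose the $|\phi^i_a\rangle$ to be tensor products of single-qubit Pauli eigenstates.)

• Let

$$m_i = \left\lceil \frac{4}{\chi_{\mathcal{U}}(k_i,k_i')^2\,\ell\,\epsilon^2}\,\log(4/\delta) \right\rceil. \tag{45}$$

• For each $j = 1,\ldots,m_i$: choose some $a_{ij} \in \{1,\ldots,d\}$ uniformly at random, prepare the state $|\phi^i_{a_{ij}}\rangle$, apply the channel $\mathcal{E}$, and measure the Pauli observable $W_{k_i}$, to get an outcome $A_{ij} \in \{1,-1\}$; finally, let $B_{ij} = \lambda^i_{a_{ij}}A_{ij}$.

Note that

$$\mathbb{E}B_{ij} = \frac{1}{d}\sum_{a_{ij}=1}^{d}\lambda^i_{a_{ij}}\,\mathrm{tr}\bigl(W_{k_i}\,\mathcal{E}(|\phi^i_{a_{ij}}\rangle\langle\phi^i_{a_{ij}}|)\bigr) = \frac{1}{d}\mathrm{tr}\bigl(W_{k_i}\,\mathcal{E}(W_{k_i'})\bigr) = \chi_{\mathcal{E}}(k_i,k_i'). \tag{46}$$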
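The identity behind this expectation value, namely that averaging over eigenstates of the input Pauli weighted by their eigenvalues reproduces the effect of inputting the Pauli itself, can be checked on a single qubit. Everything in this sketch is an assumption made for illustration: the helper functions, and the hypothetical test channel (a Hadamard followed by depolarizing noise of strength 0.1).

```python
import math

def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def dagger(a):
    """Conjugate transpose."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))] for i in range(len(a[0]))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def projector(v):
    """|v><v| for a state vector v."""
    return [[x * complex(y).conjugate() for y in v] for x in v]

s = 1 / math.sqrt(2)
PAULI_X = [[0, 1], [1, 0]]
PAULI_Z = [[1, 0], [0, -1]]
HADAMARD = [[s, s], [s, -s]]

def channel(m, noise=0.1):
    """Hypothetical test channel: Hadamard, then depolarizing noise, extended
    linearly to arbitrary 2x2 matrices m."""
    rotated = matmul(matmul(HADAMARD, m), dagger(HADAMARD))
    t = trace(m)
    return [[(1 - noise) * rotated[i][j] + noise * t * (0.5 if i == j else 0.0)
             for j in range(2)] for i in range(2)]

d = 2
# Eigenvectors of the input Pauli X with eigenvalues +1 and -1.
eigenbasis = [([s, s], +1), ([s, -s], -1)]
# Average of eigenvalue * tr(W_k E(|phi_a><phi_a|)) over the eigenbasis, with W_k = Z.
lhs = sum(lam * trace(matmul(PAULI_Z, channel(projector(v)))) for v, lam in eigenbasis) / d
# Characteristic function value tr(W_k E(W_k')) / d, with W_k = Z and W_k' = X.
rhs = trace(matmul(PAULI_Z, channel(PAULI_X))) / d
```

Both quantities equal 0.9 here, the ideal value 1 for the Hadamard shrunk by the depolarizing factor.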



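Let

$$\tilde{X}_i = \frac{1}{\chi_{\mathcal{U}}(k_i,k_i')\,m_i}\sum_{j=1}^{m_i}B_{ij}. \tag{47}$$

Finally, we let $\tilde{Y} = \frac{1}{\ell}\sum_{i=1}^{\ell}\tilde{X}_i$. This is our estimate for $Y$; note that $\mathbb{E}\tilde{Y} = Y$. By Hoeffding's inequality,

$$\Pr[\,|\tilde{Y} - Y| \ge \epsilon\,] \le \delta. \tag{48}$$

This procedure uses the channel $\mathcal{E}$ a total of $m$ times, where $m = \sum_{i=1}^{\ell}m_i$. This number depends on the random choice of $(k_i,k_i')$ ($i = 1,\ldots,\ell$). We can bound it in expectation: we have

$$\mathbb{E}(m_i) = \sum_{k_i,k_i'}\frac{1}{d^2}\chi_{\mathcal{U}}(k_i,k_i')^2\,m_i \le 1 + \frac{4d^2}{\ell\epsilon^2}\log(4/\delta), \tag{49}$$

hence

$$\mathbb{E}(m) \le 1 + \frac{1}{\epsilon^2\delta} + \frac{4d^2}{\epsilon^2}\log(4/\delta). \tag{50}$$

Then use Markov's inequality: $\Pr(m \ge t\cdot\mathbb{E}(m)) \le 1/t$, for all $t \ge 1$.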

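It remains to prove (44) and (48), bounding the failure probability. To show (44), note that the variance of each $X_i$ is not too large:

$$\mathrm{Var}(X_i) = \mathbb{E}(X_i^2) - (\mathbb{E}X_i)^2 = \sum_{k,k'}\frac{1}{d^2}\chi_{\mathcal{E}}(k,k')^2 - \frac{1}{d^4}\mathrm{tr}(\mathcal{U}^\dagger\mathcal{E})^2 = \frac{1}{d^2}\mathrm{tr}(\mathcal{E}^\dagger\mathcal{E}) - \frac{1}{d^4}\mathrm{tr}(\mathcal{U}^\dagger\mathcal{E})^2 \le 1. \tag{51}$$

Then $\mathrm{Var}(Y) \le 1/\ell$, so by Chebyshev's inequality,

$$\Pr\left[\,|Y - (1/d^2)\,\mathrm{tr}(\mathcal{U}^\dagger\mathcal{E})| \ge \lambda/\sqrt{\ell}\,\right] \le \frac{1}{\lambda^2}. \tag{52}$$

Now set $\lambda = 1/\sqrt{\delta}$ and $\ell = \lceil 1/(\epsilon^2\delta)\rceil$.

To show (48), we use Hoeffding's inequality: for any $\epsilon > 0$,

$$\Pr[\,|\tilde{Y} - Y| \ge \epsilon\,] \le 2\exp(-2\epsilon^2/C), \tag{53}$$

where

$$C = \sum_{i=1}^{\ell}\sum_{j=1}^{m_i}(2c_i)^2, \qquad c_i = \frac{1}{\ell\,m_i\,\chi_{\mathcal{U}}(k_i,k_i')}. \tag{54}$$

We have

$$C = \sum_{i=1}^{\ell}\frac{1}{\ell^2}\,\frac{4}{\chi_{\mathcal{U}}(k_i,k_i')^2\,m_i} \le \frac{\epsilon^2}{\log(4/\delta)}, \tag{55}$$

hence the failure probability is $\le \delta$ as desired.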



